Enhanced Regular Expression as a DGL for Generation of Synthetic Big Data

Kai Cheng; Keisuke Abe

JIPS (Korea Information Processing Society)

English Title	Enhanced Regular Expression as a DGL for Generation of Synthetic Big Data
Author	Kai Cheng; Keisuke Abe
Citation	Vol. 19, No. 1, pp. 1-16 (Feb. 2023)
English Abstract	Synthetic data generation is widely used in performance evaluation and functional testing of data-intensive applications, as well as in various areas of data analytics, such as privacy-preserving data publishing (PPDP) and statistical disclosure limitation/control. A significant amount of research has been conducted on tools and languages for data generation. However, existing tools and languages have been developed for specific purposes and are unsuitable for other domains. In this article, we propose a regular expression-based data generation language (DGL) for flexible big data generation. To achieve a general-purpose and powerful DGL, we enhanced standard regular expressions to support data domains, type/format inference, sequence and random generation, probability distributions, and resource references. To implement the proposed language efficiently, we propose caching techniques for both intermediate results and database queries. We evaluated the proposed improvements experimentally.
Keywords	Big Data Analytics; Data Generation Language (DGL); Performance Analysis; Regular Expression; Synthetic Data Generation; Type/format Inference
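
To make the idea of using a regular expression as a generator rather than a matcher more concrete, the following is a minimal Python sketch. It is not the authors' DGL: the supported subset (literals, `\d`, bracketed character classes, and `{n}` / `{m,n}` repetition), the function names `parse`, `expand_class`, and `generate`, and the uniform random choice are all assumptions made for illustration. The `lru_cache` on `parse` is only a loose stand-in for the idea of caching intermediate results, not the caching technique described in the paper.

```python
import random
import re
import string
from functools import lru_cache


def expand_class(body):
    """Expand a character-class body such as 'a-z0-9' into its characters."""
    chars = []
    j = 0
    while j < len(body):
        if j + 2 < len(body) and body[j + 1] == "-":
            # Range like a-z: include both endpoints.
            chars.extend(chr(c) for c in range(ord(body[j]), ord(body[j + 2]) + 1))
            j += 3
        else:
            chars.append(body[j])
            j += 1
    return "".join(chars)


@lru_cache(maxsize=None)
def parse(pattern):
    """Parse a tiny regex subset into a tuple of (choices, min, max) tokens.

    Hypothetical helper: handles literals, backslash escapes (\\d means a
    digit, any other escaped character is taken literally), [..] classes,
    and {n} or {m,n} repetition counts.
    """
    tokens = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            nxt = pattern[i + 1]
            choices = string.digits if nxt == "d" else nxt
            i += 2
        elif ch == "[":
            end = pattern.index("]", i)
            choices = expand_class(pattern[i + 1:end])
            i = end + 1
        else:
            choices = ch
            i += 1
        lo = hi = 1
        if i < len(pattern) and pattern[i] == "{":
            end = pattern.index("}", i)
            parts = pattern[i + 1:end].split(",")
            lo, hi = int(parts[0]), int(parts[-1])
            i = end + 1
        tokens.append((choices, lo, hi))
    return tuple(tokens)


def generate(pattern, rng):
    """Produce one random string that matches `pattern` (uniform choices)."""
    out = []
    for choices, lo, hi in parse(pattern):
        count = rng.randint(lo, hi)
        out.extend(rng.choice(choices) for _ in range(count))
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    pattern = r"[A-Z]{3}-\d{4}"
    for _ in range(3):
        value = generate(pattern, rng)
        # Every generated value should be accepted by the same pattern.
        assert re.fullmatch(pattern, value)
        print(value)
```

In this sketch every character is drawn uniformly. Supporting the probability distributions mentioned in the abstract would presumably mean replacing `rng.choice` with a weighted draw, but how the paper expresses that in its syntax is not stated here.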